Energy efficiency of finite difference algorithms on multicore CPUs, GPUs, and Intel Xeon Phi processors
Abstract
In addition to the hardware wall-time restrictions commonly seen in high-performance computing systems, future systems are likely to be constrained by energy budgets as well. In the present work, finite difference algorithms of varying computational and memory intensity are evaluated with respect to both energy efficiency and runtime on an Intel Ivy Bridge CPU node, an Intel Xeon Phi Knights Landing processor, and an NVIDIA Tesla K40c GPU. The conventional approach of storing the discretised derivatives in global arrays for solution advancement is found to be inefficient in terms of both energy consumption and runtime. In contrast, a class of algorithms in which the discretised derivatives are evaluated on-the-fly or stored as thread-/process-local variables (yielding high compute intensity) is optimal with respect to both energy consumption and runtime. On all three hardware architectures considered, a speed-up by a factor of ∼2 and an energy saving by a factor of ∼2 are observed for the high-compute-intensity algorithms compared to the memory-intensive algorithm. The energy consumption is found to be proportional to runtime, irrespective of the power consumed, and the GPU achieves an energy saving by a factor of ∼5 compared to the same algorithm on a CPU node.
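The difference between the two algorithm classes described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the code evaluated in the paper: the model problem (1D diffusion with periodic boundaries), the second-order central difference, the explicit Euler update, and the names `step_stored`, `step_on_the_fly`, and `advance` are all hypothetical choices made here for clarity.

```python
def step_stored(u, nu, dx, dt):
    """Memory-intensive variant (assumed form): the discretised second
    derivative is first written to a full-size array, then read back
    to advance the solution."""
    n = len(u)
    d2u = [0.0] * n  # extra array the same size as the solution
    for i in range(n):
        # u[i - 1] wraps to the last point when i == 0 (periodic boundary)
        d2u[i] = (u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) / (dx * dx)
    return [u[i] + dt * nu * d2u[i] for i in range(n)]


def step_on_the_fly(u, nu, dx, dt):
    """Compute-intensive variant (assumed form): the derivative exists
    only as a loop-local value and is consumed immediately."""
    n = len(u)
    out = [0.0] * n
    for i in range(n):
        d2u_i = (u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) / (dx * dx)
        out[i] = u[i] + dt * nu * d2u_i
    return out


def advance(step, u, nsteps, nu, dx, dt):
    """Apply the chosen step function nsteps times."""
    for _ in range(nsteps):
        u = step(u, nu, dx, dt)
    return u


if __name__ == "__main__":
    # Top-hat initial condition on 32 points; dt * nu / dx**2 = 0.2 is stable.
    u0 = [1.0 if 8 <= i < 16 else 0.0 for i in range(32)]
    a = advance(step_stored, u0, 10, 1.0, 1.0, 0.2)
    b = advance(step_on_the_fly, u0, 10, 1.0, 1.0, 0.2)
    print(max(abs(x - y) for x, y in zip(a, b)))
```

Both variants perform the same arithmetic and produce the same solution; they differ only in whether the derivative makes a round trip through an array. Pure Python will not reproduce the runtime or energy behaviour reported in the abstract, so the sketch is meant to show the data flow only.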
Similar resources
First Evaluation of the CPU, GPGPU and MIC Architectures for Real Time Particle Tracking based on Hough Transform at the LHC
Recent innovations focused around parallel processing, either through systems containing multiple processors or processors containing multiple cores, hold great promise for enhancing the performance of the trigger at the LHC and extending its physics program. The flexibility of the CMS/ATLAS trigger system allows for easy integration of computational accelerators, such as NVIDIA’s Tesla Graphic...
Improving Main Memory Hash Joins on Intel Xeon Phi Processors: An Experimental Approach
Modern processor technologies have driven new designs and implementations in main-memory hash joins. Recently, Intel Many Integrated Core (MIC) co-processors (commonly known as Xeon Phi) embrace emerging x86 single-chip many-core techniques. Compared with contemporary multi-core CPUs, Xeon Phi has quite different architectural features: wider SIMD instructions, many cores and hardware contexts, ...
Performance Portability for Room Acoustics Simulations
Numerical modelling of the 3-D wave equation can result in very accurate virtual auralisation, at the expense of computational cost. Implementations targeting modern highly-parallel processors such as NVIDIA GPUs (Graphics Processing Units) are known to be very effective, but are tied to the specific hardware for which they are developed. In this paper, we investigate extending the portability ...
Porting of the DBCSR Library for Sparse Matrix-Matrix Multiplications to Intel Xeon Phi Systems
Multiplication of two sparse matrices is a key operation in the simulation of the electronic structure of systems containing thousands of atoms and electrons. The highly optimized sparse linear algebra library DBCSR (Distributed Block Compressed Sparse Row) has been specifically designed to efficiently perform such sparse matrix-matrix multiplications. This library is the basic building block f...
Matrix factorization routines on heterogeneous architectures
In this work we consider a method for parallelizing matrix factorization algorithms on systems with Intel® Xeon Phi™ coprocessors. We provide performance results of matrix factorization routines implementing this approach and available in Intel® Math Kernel Library (Intel MKL) on the Intel® Xeon® processor line with Intel Xeon Phi coprocessors. Summary New heterogeneous systems consisting...
Journal title:
- CoRR
Volume abs/1709.09713, Issue -
Pages -
Publication date 2017